223 research outputs found

    New Classification and Generative Model for Medical Visual Question Answering

    Medical images play an important role in the medical domain. A mature medical visual question answering system can aid diagnosis, but no satisfactory method yet exists for this comprehensive problem. Considering that there are many different types of questions, we propose a model called CGMVQA, with both classification and answer-generation capabilities, to turn this complex problem into multiple simpler ones. We adopt data augmentation on images and tokenization on texts. We use a pre-trained ResNet152 to extract image features and add three kinds of embeddings together to handle texts. We reduce the parameters of the multi-head self-attention transformer to cut the computational cost. We adjust the masking and output layers to change the functions of the model. This model establishes new state-of-the-art results on the ImageCLEF 2019 VQA-Med data set: 0.640 classification accuracy, 0.659 word matching, and 0.678 semantic similarity. This suggests that CGMVQA is effective in medical visual question answering and can better assist doctors in clinical analysis and diagnosis.
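    The "three kinds of embeddings added together" can be illustrated with a minimal numpy sketch, assuming BERT-style token, position, and segment embeddings (table sizes and dimensions here are arbitrary, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, DIM = 1000, 32, 16
tok_table = rng.normal(size=(VOCAB, DIM))    # token embedding table
pos_table = rng.normal(size=(MAX_LEN, DIM))  # position embedding table
seg_table = rng.normal(size=(2, DIM))        # segment table, e.g. 0 = question, 1 = answer

def embed(token_ids, segment_ids):
    """Sum token, position, and segment embeddings per position,
    mirroring 'add three kinds of embeddings together'."""
    positions = list(range(len(token_ids)))
    return tok_table[token_ids] + pos_table[positions] + seg_table[segment_ids]
```

    Each input position thus carries its identity, its place in the sequence, and which text segment it belongs to in a single vector.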

    WRGAN: Improvement of RelGAN with Wasserstein Loss for Text Generation

    Generative adversarial networks (GANs) were first proposed in 2014 and have been widely used in computer vision, for tasks such as image generation. However, GANs for text generation have made slow progress. One reason is that the discriminator's guidance for the generator is too weak: the generator only receives a "true or false" probability in return. Compared with the current loss function, the Wasserstein distance can provide more information to the generator, but RelGAN does not work well with the Wasserstein distance in experiments. In this paper, we propose an improved neural network based on RelGAN and Wasserstein loss, named WRGAN. Unlike RelGAN, we modify the discriminator network structure with 1D convolutions of multiple kernel sizes. Correspondingly, we also change the loss function of the network to a gradient-penalty Wasserstein loss. Our experiments on multiple public datasets show that WRGAN outperforms most existing state-of-the-art methods, and Bilingual Evaluation Understudy (BLEU) scores are improved with our method.
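    The gradient-penalty Wasserstein critic loss can be sketched in numpy. This is a generic WGAN-GP formulation, not the paper's exact network: to avoid automatic differentiation, it assumes a toy linear critic f(x) = w·x, whose input gradient is simply w everywhere:

```python
import numpy as np

def wgan_gp_critic_loss(w, real, fake, lam=10.0):
    """Wasserstein critic loss with gradient penalty for a linear critic
    f(x) = x @ w. For a linear critic the input gradient at any point
    (including real/fake interpolations) is just w, so the penalty is analytic."""
    eps = np.random.rand(real.shape[0], 1)
    interp = eps * real + (1 - eps) * fake   # where a nonlinear critic's gradient
                                             # would normally be evaluated
    d_real = real @ w                        # critic scores on real samples
    d_fake = fake @ w                        # critic scores on generated samples
    grad_norm = np.linalg.norm(w)            # ||∇f|| is constant for a linear f
    penalty = lam * (grad_norm - 1.0) ** 2   # push gradient norm toward 1
    # critic maximizes E[f(real)] - E[f(fake)]; its loss is the negative, plus penalty
    return d_fake.mean() - d_real.mean() + penalty
```

    Unlike a plain "true or false" probability, the critic's score difference gives the generator a graded signal of how far its samples are from the real distribution.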

    Bert4CMR: Cross-Market Recommendation with Bidirectional Encoder Representations from Transformer

    Real-world multinational e-commerce companies, such as Amazon and eBay, serve multiple countries and regions. These markets carry similar goods but have different users; some markets are data-scarce, while others are data-rich. In recent years, cross-market recommendation (CMR) has been proposed to enhance data-scarce markets by leveraging auxiliary information from data-rich markets. Previous works fine-tune a pre-trained model on the local market after freezing part of the parameters, or introduce inter-market similarity into the local market, to improve CMR performance. However, they generally do not consider eliminating the mutual interference between markets, so existing methods can neither learn unbiased general knowledge nor efficiently transfer reusable information across markets. In this paper, we propose a novel attention-based model called Bert4CMR to simultaneously improve recommendation performance in all markets. Specifically, we employ the attention mechanism to capture user interests by modelling user behavioural sequences. We pre-train the proposed model on global data to learn general knowledge of items, then fine-tune it on specific target markets to perform local recommendations. We propose market embedding to model the bias of each market and reduce the mutual interference between parallel markets. Extensive experiments conducted on seven markets show that our model is state-of-the-art: it outperforms the second-best model by 4.82%, 4.73%, 7.66% and 6.49% on average across the seven datasets in terms of four metrics, respectively. Ablation experiments indicate that our model learns general knowledge through global data and shields markets from mutual interference.
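    The market-embedding idea can be sketched as follows: a shared global item table plus one learned bias vector per market, so market-specific bias is absorbed by a dedicated vector instead of leaking into the shared item representations. All names, market IDs, and dimensions below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 8
item_emb = rng.normal(size=(100, EMB_DIM))     # shared (global) item embedding table
market_emb = {"us": rng.normal(size=EMB_DIM),  # one bias vector per market
              "de": rng.normal(size=EMB_DIM)}

def encode_sequence(item_ids, market):
    """Pool the user's item sequence from the shared table, then add the
    market embedding so per-market bias lives in its own vector."""
    pooled = item_emb[item_ids].mean(axis=0)   # stand-in for the attention encoder
    return pooled + market_emb[market]
```

    Mean pooling here stands in for the attention-based sequence encoder; the point of the sketch is only how a market vector offsets an otherwise shared representation.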

    Knowledge Graph based Question and Answer System for Cosmetic Domain

    With the development of e-commerce, customers' requirements for products have become more detailed, and the workload of customer service consultants has increased massively. However, manufacturers are not obliged to list specific product ingredients on their websites. It is therefore necessary to construct a knowledge-base question answering (KBQA) system to relieve the pressure on online customer service and effectively help customers find suitable skincare products. In the cosmetic field, different basic cosmetics may have varied effects depending on their ingredients. In this paper, we use the CosDNA website and online cosmetic websites to construct a cosmetic product knowledge graph that connects cosmetics, ingredients, skin types, and effects. We then build a question answering system on top of this knowledge graph so that users can look up product details directly and make decisions quickly.
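    At its core, answering over such a graph reduces to matching (entity, relation, ?) triples. A minimal sketch with a toy triple store follows; the product and ingredient names are invented for illustration and are not from CosDNA:

```python
def answer(kg, entity, relation):
    """Return all objects o such that (entity, relation, o) is in the graph."""
    return sorted(o for s, r, o in kg if s == entity and r == relation)

# Hypothetical triples of the kind the cosmetic graph would hold.
kg = [
    ("ProductA", "contains", "niacinamide"),
    ("ProductA", "contains", "glycerin"),
    ("niacinamide", "suits", "oily skin"),
]
```

    A real system would first map the user's question to an entity and relation (entity linking and relation classification) before this lookup step.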

    Intention Detection Based on Siamese Neural Network With Triplet Loss

    Understanding the user's intention is an essential task for the spoken language understanding (SLU) module in a dialogue system, as it provides vital information for managing and generating future actions and responses. In this paper, we propose a triplet training framework based on the multiclass classification approach for the intention detection task. Specifically, we utilize a Siamese neural network architecture with metric learning to construct a robust and discriminative utterance embedding model. We use a modified RMCNN model and a fine-tuned BERT model as Siamese encoders to train on utterance triplets from different semantic aspects. The triplet loss can effectively distinguish fine details between inputs by learning a mapping from utterances to a compact Euclidean space. Once this mapping is learned, intention detection can be implemented with standard techniques using the pre-trained embeddings as feature vectors. In addition, we use a fusion strategy to enhance utterance representations in the downstream intention detection task. We conduct experiments on several benchmark intention detection datasets: Snips, ATIS, the Facebook multilingual task-oriented datasets, Daily Dialogue, and MRDA. The results show that the proposed method effectively improves recognition performance and achieves new state-of-the-art results on single-turn task-oriented datasets (Snips, Facebook) and a multi-turn dataset (Daily Dialogue).
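    The triplet margin loss at the heart of this framework can be stated in a few lines of numpy. This is the standard formulation on already-encoded embedding vectors, not the paper's full training loop:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss on embedding vectors: pull the anchor toward a
    same-intent (positive) utterance and push it at least `margin` farther
    from a different-intent (negative) one."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)    # zero once the margin is satisfied
```

    Once embeddings trained this way cluster by intent in Euclidean space, a simple nearest-centroid or linear classifier on top suffices for detection.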

    Emotion Expression Extraction Method for Chinese Microblog Sentences

    With the rapid spread of Chinese microblogs, a large number of microblog topics are generated in real time, and more and more users pay attention to the emotion expressions of opinionated sentences in different topics. Labelling these emotion expressions manually is challenging, so this paper proposes an emotion expression extraction method that processes millions of user-generated opinionated sentences automatically. The proposed method comprises two tasks: emotion classification and opinion target extraction. We first use a lexicon-based emotion classification method to compute the emotion values in the emotion label vectors of opinionated sentences. These label vectors are then revised by an unsupervised emotion label propagation algorithm. After candidate opinion targets are extracted, the opinion target extraction task is performed by a random walk-based ranking algorithm that ranks candidate opinion targets by considering both the connections between candidates and the textual similarity between opinionated sentences. Experimental results demonstrate the effectiveness of the algorithms in the proposed method.
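    Random walk-based ranking of candidates over a connection graph can be sketched as a PageRank-style power iteration. This is the generic algorithm family the abstract names, with an arbitrary adjacency matrix standing in for the paper's combined connection-plus-similarity graph:

```python
import numpy as np

def random_walk_rank(adj, damping=0.85, iters=100):
    """Rank nodes (candidate opinion targets) by a damped random walk over
    a weighted graph given as an adjacency matrix."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    P = adj / col_sums                 # column-stochastic transition matrix
    r = np.full(n, 1.0 / n)            # start from the uniform distribution
    for _ in range(iters):
        r = (1 - damping) / n + damping * (P @ r)
    return r
```

    Candidates that many other well-connected candidates point to accumulate probability mass and rank highest.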

    Data-Driven Channel Pruning towards Local Binary Convolution Inverse Bottleneck Network Based on Squeeze-and-Excitation Optimization Weights

    This paper proposes a model pruning method based on local binary convolution (LBC) and squeeze-and-excitation (SE) optimization weights. We first propose an efficient depthwise-separable convolution model based on the LBC kernel. By expanding the number of LBC kernels, we train a larger model with better results, but with more parameters and slower computation. We then extract the SE optimization weight of each SE module from data samples and score each LBC kernel accordingly. Based on the score of the convolution channel corresponding to each LBC kernel, we perform channel-based model pruning, which greatly reduces the number of model parameters and accelerates computation. The proposed pruning method is verified on an image classification database. Experiments show that, in models using the LBC kernel, recognition accuracy increases with the number of LBC kernels, and that after channel-based pruning guided by the SE optimization weights, the smaller model maintains a similar level of recognition accuracy.
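    The channel-scoring step can be sketched generically: average each channel's SE gate value over a batch of data samples, then keep only the top-scoring channels. The averaging rule and keep ratio here are illustrative assumptions, not the paper's exact criterion:

```python
import numpy as np

def prune_channels(se_weights, keep_ratio=0.5):
    """se_weights: (samples, channels) array of SE gate activations collected
    over data samples. Score each channel by its mean gate value and return
    the sorted indices of the channels to keep."""
    scores = se_weights.mean(axis=0)             # data-driven importance per channel
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(scores)[::-1][:k]          # highest-scoring channels first
    return np.sort(keep)                         # sorted indices for slicing weights
```

    The returned indices can then be used to slice the convolution weight tensors, physically removing the low-scoring channels rather than merely zeroing them.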

    Hierarchical Network with Label Embedding for Contextual Emotion Recognition

    Emotion recognition is widely used in applications such as mental health monitoring and emotional management. It is usually treated as a text classification task, but it is more complex than that: the relations between emotions expressed in a text are non-negligible. In this paper, a hierarchical model with label embedding is proposed for contextual emotion recognition. Specifically, the hierarchical model learns the emotional representation of a given sentence from its contextual information. To exploit emotion correlations in recognition, a label embedding matrix is trained by joint learning and contributes to the final prediction. Comparison experiments are conducted on the Chinese emotional corpus RenCECps, and the results indicate that our approach performs well on the textual emotion recognition task.
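    How a label embedding matrix contributes to prediction can be shown in miniature: score each emotion label by the similarity between its embedding and the sentence representation. This is a generic label-embedding scoring head, assuming both live in the same vector space, not the paper's exact architecture:

```python
import numpy as np

def label_scores(sentence_repr, label_emb):
    """Score each emotion label by its dot product with the sentence
    representation, then normalize with a softmax. Because labels are vectors,
    correlated emotions can share directions in embedding space."""
    logits = label_emb @ sentence_repr          # (num_labels,) similarity scores
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()
```

    During joint training, both the sentence encoder and the label matrix are updated, so labels that co-occur end up with nearby embeddings.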